A comparative study of sample selection methods for classification

Authors

  • Patricia E. N. Lutu
  • Andries Petrus Engelbrecht

Abstract

Sampling of large datasets for data mining is important for at least two reasons. First, processing large amounts of data increases computational complexity, and the cost of this additional complexity may not be justifiable. Second, using small samples allows data mining algorithms to run quickly and efficiently. This paper discusses statistical methods for obtaining sufficient samples from datasets for classification problems. Results are presented for an empirical study based on sequential random sampling, with samples evaluated using univariate hypothesis testing and an information-theoretic measure. Theoretical estimates are compared with empirical ones.
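
The abstract does not spell out the procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes that all attributes are categorical, that the sample grows by a fixed step (the hypothetical parameters `start` and `step`), that the univariate hypothesis test is a Pearson chi-square goodness-of-fit test at the 5% level applied to each attribute, and that the information-theoretic measure is the Kullback-Leibler divergence with an illustrative threshold `kl_max`. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Upper 5% critical values of the chi-square distribution (standard table
# values) for 1 to 5 degrees of freedom. Attributes with more than six
# distinct values would need a larger table (a KeyError is raised otherwise).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def column_proportions(rows, col):
    """Relative frequency of each value of categorical attribute `col`."""
    counts = Counter(row[col] for row in rows)
    n = len(rows)
    return {value: c / n for value, c in counts.items()}


def chi_square_stat(rows, col, pop_props):
    """Pearson goodness-of-fit statistic comparing the sample counts of
    attribute `col` with the counts expected from full-dataset proportions."""
    counts = Counter(row[col] for row in rows)
    n = len(rows)
    return sum((counts[v] - n * p) ** 2 / (n * p) for v, p in pop_props.items())


def kl_divergence_bits(sample_props, pop_props):
    """KL divergence D(sample || population) in bits. Every value seen in
    the sample also occurs in the full dataset, so pop_props[v] > 0."""
    return sum(p * math.log2(p / pop_props[v]) for v, p in sample_props.items())


def sequential_random_sample(rows, start=200, step=200, kl_max=0.01, seed=0):
    """Grow a simple random sample (without replacement) by `step` rows until,
    for every attribute, the chi-square test does not reject the full-dataset
    distribution at the 5% level and the KL divergence is below `kl_max`."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)  # prefixes of a shuffled copy are nested random samples
    n_cols = len(rows[0])
    pops = [column_proportions(rows, c) for c in range(n_cols)]
    n = start
    while n < len(shuffled):
        sample = shuffled[:n]
        accepted = True
        for c in range(n_cols):
            df = len(pops[c]) - 1
            if df == 0:
                continue  # constant attribute: nothing to test
            stat = chi_square_stat(sample, c, pops[c])
            kl = kl_divergence_bits(column_proportions(sample, c), pops[c])
            if stat >= CHI2_CRIT_05[df] or kl >= kl_max:
                accepted = False
                break
        if accepted:
            return sample
        n += step
    return shuffled  # no smaller sample passed: use the whole dataset


if __name__ == "__main__":
    # Hypothetical synthetic data: 10,000 rows with a 3-valued attribute
    # and a binary class label.
    gen = random.Random(42)
    data = [(gen.choice("abc"), gen.random() < 0.3) for _ in range(10_000)]
    sample = sequential_random_sample(data)
    print(f"accepted sample size: {len(sample)} of {len(data)}")
```

Taking prefixes of one shuffled copy makes each candidate sample a superset of the previous one, so the search extends a single sample instead of redrawing. Both criteria are combined because roughly 95% of random samples of any size pass a 5%-level test on their own; the KL threshold is what ties the accepted size to how closely the sample matches the full dataset. The sketch makes no correction for testing several attributes.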

Similar resources

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

A Comparative Study of SVM and RF Methods for Classification of Alteration Zones Using Remotely Sensed Data

Identification and mapping of the significant alterations are the main objectives of exploration geochemical surveys. Producing classified maps through field study is time-consuming and costly. Therefore, processing remotely sensed data, which provides timely, multi-band (multi-layer) data, can substitute for the field study. In this study, ASTER imagery is used for altera...

SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy

In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example, in cancer classification....

Negative Selection Based Data Classification with Flexible Boundaries

One of the most important artificial immune algorithms is the negative selection algorithm, an anomaly detection and pattern recognition technique; moreover, recent research has shown the successful application of this algorithm to data classification. Most negative selection methods use deterministic boundaries to distinguish between self and non-self spaces. In this paper, two...

Provide a Model for Shaping the Subject in Comparative Studies and Research in the Field of Art With Emphasis on Interdisciplinary Studies

Treating comparative research as a "separate and different research process" is an issue that has not been addressed thoroughly, at least in Iran, and few of the studies conducted under the title of "comparative" actually use methods different from the usual research methods. On the other hand, there has been a rise in the importance of interaction between different...

Comparative Approach to the Backward Elimination and Forward Selection Methods in Modeling the Systematic Risk Based on the ARFIMA-FIGARCH Model

The present study aims to model systematic risk using financial and accounting variables. Accordingly, data for 174 companies listed on the Tehran Stock Exchange are extracted for the period 2006 to 2016. First, the systematic risk index is estimated using the ARFIMA-FIGARCH model. Then, based on the research background, 35 effective financial and accounting variables are simultaneously used with t...

Journal:
  • South African Computer Journal

Volume 36

Published: 2006